Compressing video frames

ABSTRACT

A method includes generating first difference frames and compressing the first difference frames to form compressed difference frames. The compressed difference frames are decompressed to form decompressed difference frames, and the decompressed difference frames are used in the generation of the first difference frames.

BACKGROUND

The invention relates to compressing video frames.

Referring to FIG. 1, a digital imaging system 5 may include a digital camera 12 that electrically captures a digitized representation, or a pixel image, of an optical image 11. The pixel image typically is represented by a frame of data, and each portion, or pixel, of the pixel image is represented by one or more bytes of the frame. Although the camera 12 may transmit the original, uncompressed frame to a computer 14 via a bus 15 (a serial bus, for example), the camera 12 might first compress the frame before transmission due to the limited available bandwidth of the bus 15.

As an example of the bandwidth limitations, the bus 15 may be a Universal Serial Bus (USB), and the camera 12 may use an isochronous communication channel of the bus 15 to communicate with the computer 14. However, other devices (an infrared transceiver 16, for example, that communicates with an infrared keyboard 7 and a mouse 8, as examples) may also use the bus 15. Therefore, because multiple devices may use the bus 15, the camera 12 typically does not reserve all of the available bandwidth of the bus 15 but rather, reserves a reasonable portion (a bandwidth of 5 Mbits/sec., for example) of the available bandwidth. Otherwise, without this self-imposed restriction, the remaining available bandwidth of the bus 15 may be insufficient to support the communications required by the other bus devices.

However, the bandwidth reserved by the camera 12 may be insufficient to transmit uncompressed frames. As an example, for video, the camera 12 may transmit thirty frames every second. In this manner, if the camera 12 transmits a frame of 352 columns by 288 rows thirty times every second and represents each pixel by eight bits, then the required bandwidth is 24,330,240 bits/second. Not only does this well exceed the limited bandwidth reserved for the camera 12, this requirement might also exceed the total available bandwidth of the bus 15. Thus, compression might be required to reduce the required bandwidth.
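The arithmetic above can be checked with a short sketch. The function and variable names are illustrative only and do not come from the description; the 5 Mbits/sec figure is the example reservation mentioned earlier.

```python
def raw_video_bandwidth(columns, rows, bits_per_pixel, frames_per_second):
    """Bits per second needed to send frames without compression."""
    return columns * rows * bits_per_pixel * frames_per_second


required = raw_video_bandwidth(352, 288, 8, 30)
reserved = 5_000_000  # example reservation of 5 Mbits/sec

print(required)             # 24330240
print(required > reserved)  # True
```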

The compression of each frame by the camera 12 may require spatial filtering. In this manner, the pixel intensities of the image typically vary spatially across the image, and the rate at which these intensities vary is called the spatial frequency, which also varies across the image. As an example, the boundaries of objects generally introduce high spatial frequencies because the levels of the pixel intensities change rapidly near object boundaries.

In a technique called a wavelet transformation, the camera 12 may spatially filter the image in different directions to produce frequency sub-band images, and the camera 12 may then compress the data associated with the sub-band images. The transformation of the original pixel image into the frequency sub-band images typically includes spatially filtering the original pixel image (and thus, the associated data) in both vertical and horizontal directions. For example, referring to FIG. 2, a 9-7 bi-orthogonal spline filter may be used to filter an original pixel image 18 (having a resolution of 1280 columns by 960 rows, for example) to produce four frequency sub-band images 19 (each having a resolution of 640 columns by 480 rows, for example). Thus, the sub-band images 19 represent the image 18 after being spatially filtered along the vertical and horizontal directions.

Referring to FIG. 3, to compress the data associated with the original frame, the camera 12 may first transform (block 2) the corresponding pixel image into frequency sub-band images. After the transformation, the camera 12 may quantize (block 3) the data associated with the frequency sub-band images 19 to reduce the bit precision (and size) of the data and increase the number of zeros in the data. For example, the camera 12 might truncate the four least significant bits of each byte of the data. Therefore, for example, instead of each intensity value being represented by eight bits, each intensity value may instead be represented by four bits.
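As a minimal sketch of the truncation example, assuming each value is an unsigned eight-bit integer, the quantization and its inverse might look as follows. The names `quantize` and `dequantize` are hypothetical and chosen only for illustration.

```python
def quantize(value, dropped_bits=4):
    """Drop the least significant bits of an unsigned value."""
    return value >> dropped_bits


def dequantize(code, dropped_bits=4):
    """Scale a quantized code back up; the dropped bits come back as zeros."""
    return code << dropped_bits


original = 0b10110111            # 183
code = quantize(original)        # 0b1011, four bits instead of eight
restored = dequantize(code)      # 0b10110000

print(code, restored, original - restored)  # 11 176 7
```

The difference between `original` and `restored` is the precision that truncation discards.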

To complete the compression, the camera 12 may entropy encode (block 4) the quantized data. In entropy encoding, redundant data patterns are consolidated. Therefore, because the quantization increases the number of zeros in the data, the quantization typically enhances the effectiveness of the entropy encoding. As an example, one type of entropy encoding, called Huffman encoding, uses variable length codes to represent the data. The shorter codes are used to represent patterns of the data that occur more frequently, and the longer codes are used to represent patterns of the data that occur less frequently. By using this scheme, the total size of the data is reduced.
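The idea of variable length codes can be sketched with a small Huffman code builder. This is an illustrative implementation using only the Python standard library, not the encoder of the camera 12; the sample data and the name `huffman_code` are assumptions.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate input: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: partial_code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]


quantized = [0] * 12 + [1] * 3 + [2]  # mostly zeros, as after quantization
code = huffman_code(quantized)
encoded_bits = sum(len(code[s]) for s in quantized)
fixed_bits = 2 * len(quantized)  # two bits per symbol would cover three values

print(code[0], encoded_bits, fixed_bits)  # 1 20 32
```

Because zero dominates the data, it receives a one-bit code and the total drops from 32 bits to 20 bits in this example.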

Once the computer 14 receives the compressed frame, the computer 14 may then follow the compression steps described above in reverse order in an attempt to reconstruct the original pixel image. Referring to FIG. 4, the computer 14 may perform (block 20) an inverse entropy function to reconstruct the quantized data for the sub-band images. The computer 14 may then perform (block 22) an inverse quantization function to reconstruct the data for the original sub-band images 19 and then perform (block 24) an inverse transform function on this data to reconstruct the original pixel image. Unfortunately, the precision lost by the quantization is not recoverable. For example, if in the quantization eight bits of data are quantized to produce four bits of data, the lost four bits are not recoverable when inverse quantization is performed. As a result, one problem with the above-described compression technique is that the quantization may introduce a large change in the intensity of a pixel after compression for a relatively small change in intensity of the pixel before compression.

For example, before quantization, a spatial intensity 6 (see FIG. 5) of pixels of a particular row of a frequency sub-band image may be close to a quantization threshold level (called I₁). For video, the intensity 6 may vary slightly from frame to frame due to artifacts, or noise, present in the system, and these changes may be amplified by the quantization. The amplified artifacts (an effect called scintillation) may cause otherwise stationary objects to appear to move in the reconstructed video. The human eye may be quite sensitive to this motion, and thus, the perceived quality of the video may be degraded.
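The amplification can be illustrated with a short sketch, assuming the four-bit truncation example from above so that 128 acts as a quantization threshold. The sample intensities are made up for illustration.

```python
def reconstruct(value, dropped_bits=4):
    """Quantize by truncation, then inverse quantize."""
    return (value >> dropped_bits) << dropped_bits


# One pixel over five frames: noise moves it by a few levels around 128.
captured = [127, 128, 127, 129, 126]
displayed = [reconstruct(v) for v in captured]

print(displayed)                        # [112, 128, 112, 128, 112]
print(max(captured) - min(captured))    # 3
print(max(displayed) - min(displayed))  # 16
```

A swing of three intensity levels before compression becomes a swing of sixteen levels after decompression.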

Thus, there is a continuing need for a digital imaging system that reduces the amplification of artifacts in the compression/decompression process.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic diagram of a digital imaging system of the prior art.

FIG. 2 is an illustration of a process to generate frequency sub-band images.

FIG. 3 is a flow diagram illustrating a process to compress a frame.

FIG. 4 is a flow diagram illustrating a process to decompress a frame.

FIG. 5 is a graph illustrating intensity values for pixels of a row.

FIG. 6 is a schematic diagram of a digital imaging system according to an embodiment of the invention.

FIGS. 7, 8, 9, 10 and 11 are timing diagrams illustrating the generation of the difference frames.

FIG. 12 is a schematic diagram of the camera of FIG. 6.

FIG. 13 is a block diagram of a forward wavelet transformation unit.

FIG. 14 is a schematic diagram of a cell of the unit of FIG. 13.

FIG. 15 is a schematic diagram of inverse wavelet transformation units.

FIG. 16 is a block diagram of a unit to perform both forward and reverse wavelet transformations.

FIGS. 17, 18, 19, 20, 21 and 22 illustrate waveforms of signals of the unit of FIG. 16 when the unit operates in a forward transformation mode.

FIGS. 23, 24, 25, 26, 27 and 28 illustrate waveforms of signals of the unit of FIG. 16 when the unit operates in the inverse transformation mode.

FIGS. 29 and 30 are flow diagrams illustrating operation of the unit of FIG. 16.

SUMMARY

In one embodiment, a method includes generating first difference frames and compressing the first difference frames to form compressed difference frames. The compressed difference frames are decompressed to form decompressed difference frames, and the decompressed difference frames are used in the generation of the first difference frames.

DETAILED DESCRIPTION

Referring to FIG. 6, an embodiment 27 of a digital imaging system in accordance with the invention includes a digital camera 26 which may capture video that is displayed by a computer 24. In this manner, the camera 26 may electrically capture successive pixel images to form captured frames of image data which indicate the pixel images. To reduce the bandwidth otherwise required to transmit the image data to the computer 24 (via a serial bus 29), the camera 26 uses techniques to compress the size of the transmitted frames.

In particular, instead of transmitting the frames, as captured, to the computer 24, the camera 26 may generate difference frames, compress the difference frames and then transmit the compressed difference frames to the computer 24. Each difference frame indicates the differences in pixel intensities (on a pixel-by-pixel basis) between two successive images that are captured by the camera 26. Because typically little motion occurs between two successively captured pixel images, the resultant difference frame may include a substantial amount of nonessential and redundant information that may be removed by compression. Thus, in general, the compressed difference frame is much smaller than any of the frames that are captured by the camera 26.

For purposes of reducing the potential amplification of artifacts by the compression/decompression process, the camera 26 decompresses a copy of each compressed difference frame (that is transmitted to the computer 24) in an attempt to reconstruct a frame as originally captured by the camera 26. However, instead of being an exact duplicate of the corresponding captured frame, the reconstructed frame indicates errors that are introduced by the compression/decompression process. As described below, the camera 26 uses the reconstructed frame in a feedback loop to minimize the artifacts introduced by the compression/decompression process.

The computer 24, as typical, already performs the function of reconstructing the captured frames from difference frames for purposes of displaying images on a monitor. In this manner, to reconstruct the next frame, the computer 24 decompresses a newly received difference frame and adds this difference frame to a previously reconstructed frame (the frame that indicates the image currently being displayed by the computer 24, for example). However, it has been discovered that without error compensation by the camera, errors (quantization errors, for example) may be amplified by the compression/decompression process. These amplified errors appear in the corresponding image that is displayed by the computer 24.

Referring also to FIGS. 7, 8, 9, 10 and 11, like the computer 24, the camera 26 reconstructs frames (called reconstructed frames 68) from compressed difference frames 64 (frames 64 a and 64 b, as examples). However, the camera 26 uses the reconstructed frames 68 in a feedback loop to minimize the amplification of errors in the compression/decompression process. The feedback loop is also used to form the difference frames.

As an example, the circuitry of the camera 26 may include a pixel image capture circuit 30 that captures pixel images to generate captured frames 60 (frames 60 a and 60 b, as examples) that indicate the corresponding captured pixel images. A frame adder circuit 35 (of the camera 26) forms the current difference frame 62 a by subtracting intensity values indicated by a prior reconstructed frame 68 b from the corresponding intensity values indicated by the current captured frame 60 a.

A decompression circuit 33 processes copies of compressed difference frames 64 which are generated by a compression circuit 31. The compression circuit 31 receives the difference frames 62 from the frame adder circuit 35. The decompression circuit 33 decompresses the copies of the compressed difference frames 64 to generate decompressed difference frames 66 (frames 66 a and 66 b, as examples).

A frame adder circuit 44 (of the camera 26) receives the decompressed difference frames 66 and generates the reconstructed frames 68. To accomplish this, the frame adder circuit 44 generates each new reconstructed frame 68 by combining the most recently decompressed difference frame 66 with the prior reconstructed frame 68 that was generated by the frame adder circuit 44. Thus, for example, the frame adder circuit 44 generates the prior reconstructed frame 68 b (used to generate the current difference frame 62 a) by combining the reconstructed frame 68 c with the prior difference frame 66 b.
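The feedback loop formed by the circuits 35, 31, 33 and 44 can be sketched in a few lines. This is a simplified model under stated assumptions: frames are short lists of intensities, the lossy compression is replaced by rounding to a hypothetical step of 16, and both sides start from an all-zero reconstructed frame. None of these choices comes from the description.

```python
STEP = 16  # hypothetical quantization step standing in for circuits 31 and 33


def compress(diff):
    """Lossy stand-in for compression: round each difference to the nearest step."""
    return [(d + STEP // 2) // STEP for d in diff]


def decompress(codes):
    """Stand-in for decompression: scale the codes back up."""
    return [c * STEP for c in codes]


def camera_encode(captured_frames):
    """Form each difference frame against the prior reconstructed frame."""
    reconstructed = [0] * len(captured_frames[0])  # assumed all-zero starting frame
    compressed_frames = []
    for captured in captured_frames:
        diff = [c - r for c, r in zip(captured, reconstructed)]  # adder circuit 35
        codes = compress(diff)                                   # compression circuit 31
        compressed_frames.append(codes)
        decoded = decompress(codes)                              # decompression circuit 33
        reconstructed = [r + d for r, d in zip(reconstructed, decoded)]  # adder circuit 44
    return compressed_frames, reconstructed


def computer_decode(compressed_frames, width):
    """Rebuild the displayed frame the way the receiving computer would."""
    displayed = [0] * width
    for codes in compressed_frames:
        displayed = [p + d for p, d in zip(displayed, decompress(codes))]
    return displayed


frames = [[100, 103, 97], [101, 104, 96], [99, 105, 98], [102, 103, 97]]
sent, camera_copy = camera_encode(frames)
shown = computer_decode(sent, width=3)
worst = max(abs(s - f) for s, f in zip(shown, frames[-1]))
```

In this model the camera's reconstructed frame always equals the frame the computer displays, so each new difference frame carries the error left by the previous one and the error stays within half a step instead of accumulating.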

Referring to FIG. 12, in some embodiments, the pixel image capture circuit 30 includes optics 70 to focus an optical image onto a focal plane of an imager 72. The pixel image capture circuit 30 may also include a capture and signal processing unit 74. The capture and signal processing unit 74 interacts with the imager 72 to capture representations (i.e., pixel images) of optical images and transfers the resultant captured frames 60 (see FIG. 7) to a random access memory (RAM) 92. To accomplish this, the capture and signal processing unit 74 is coupled to a bus 96, along with a memory controller 94 which receives the frame from the bus 96 and generates signals to store the data in the memory 92.

In some embodiments, the compression 31 and decompression 33 circuits may be formed from a transformation/inverse transformation unit 76 and a quantization/inverse quantization unit 78. However, in other embodiments, the transformation, inverse transformation, quantization and inverse quantization functions may be performed by separate units. Other arrangements are also possible.

Besides performing the transformation functions for both compression and decompression, the unit 76 may retrieve and combine the captured 60 and reconstructed 68 frames to form the difference frames 62 and thus, perform the functions of the adder circuit 35. In performing the transformation (for the compression circuit 31) and inverse transformation (for the decompression circuit 33) functions, the unit 76 is coupled to the bus 96 to interact with the memory 92 to retrieve and store the appropriate frames. Similarly, in performing the quantization (for the compression circuit 31) and inverse quantization (for the decompression circuit 33) functions, the unit 78 is coupled to the bus 96 to interact with the memory 92 to retrieve and store the appropriate frames.

An entropy encoding circuit 36 is coupled to the bus 96 to retrieve the compressed difference frames 64 from the memory 92 and entropy encode these frames before returning the entropy encoded difference frames to the memory 92. A serial bus interface 38 is coupled to the bus 96 to retrieve the entropy encoded difference frames and generate signals on the bus 96 to transmit the frames to the computer 24 via the bus 29.

In some embodiments, the camera 26 may also include a flash memory 84 to store frames for still pixel images that are captured by the camera 26. The flash memory 84 is accessed via a flash memory controller 86 which is coupled between the flash memory 84 and the bus 96. The camera 26 may also include a microprocessor 90 to coordinate the activities of the camera 26. The microprocessor 90 may be coupled to the bus 96 via a bus interface 88.

In some embodiments, the unit 76 may include a forward wavelet transformation unit 77 (see FIG. 13) and reverse wavelet transformation units 300 and 350 (see FIG. 15). Referring to FIG. 13, the forward wavelet transformation unit 77 receives an input x_(i) 100 which is a single N-bit integer or floating point value from an input sequence representing discrete samples of an input signal that is to be encoded. In order for the forward wavelet transformation unit 77 to be initialized, inputs x₁, x₂, x₃, and x₄ propagate through delay elements 110, 112, 114, and 116. Thereafter, for every x_(i) 100 that is received by the unit 77, two outputs are generated by unit 77. These two outputs are shown in FIG. 13 as a_(i-4) 150, which is a low frequency sub-band (LFS) output and, c_(i-3) 160, which is a high frequency sub-band (HFS) output.

Each of the basic processing cells 121, 122, 124, 126 and 128 of the unit 77 includes an adder and a multiplier, and together the cells produce two sets of intermediate outputs. The first set of intermediate outputs, $L_0$ from cell $D_0$ 121, $L_1$ from cell $D_1$ 122, $L_2$ from cell $D_2$ 124, $L_3$ from cell $D_3$ 126, and $L_4$ from cell $D_4$ 128, are added by an adder 130 to generate the LFS values $a_{i-4}$ 150. Likewise, the second set of intermediate outputs, $M_1$ from cell $D_1$ 122, $M_2$ from cell $D_2$ 124, $M_3$ from cell $D_3$ 126, and $M_4$ from cell $D_4$ 128, are added together by an adder 140 to generate the HFS values $c_{i-3}$ 160. To generate the first HFS output $c_0$, which occurs at $i=3$ ($i=0, 1, 2, \ldots$), the first four clock cycles and, consequently, the first three intermediate outputs $M_1$, $M_2$ and $M_3$ are required. On the next clock cycle, at $i=4$, the LFS output $a_0$ is generated and observed. Thus, the outputs $a_{i-4}$ 150 and $c_{i-3}$ 160 are observed alternately at the trailing edges of even and odd clock cycles. The architecture is therefore 100% utilized since, at every clock cycle, one output (either $a_{i-4}$ or $c_{i-3}$) is being generated. Though not shown in FIG. 13, a multiplexer with a clock (alternating sequence) as a control signal may be used in other embodiments to observe the alternating outputs in the proper sequence.

The architecture of the unit 77 is “systolic”, i.e., repeated, in that each of the processing cells 121, 122, 124, 126 and 128 includes one adder and one multiplier and may vary only in the value of the filter coefficients, which may be stored in a programmable register or other memory.

Referring also to FIG. 14, given two filter coefficients, $h$ and $g$, with $h$ representing the low-pass coefficients and $g$ representing the high-pass coefficients, the intermediate output $L_k$ is computed by cell 200 according to the following expression: $L_k=(p_k+q_k)\,h$. Likewise, the intermediate output $M_k$ is computed by cell 200 according to the following expression: $M_k=(p_k+q_k)\,g$. In the expressions for $L_k$ and $M_k$, the term $q_k$ represents the input data $x_i$ 100 which is the subject of the discrete wavelet transform (DWT), and the term $p_{k-1}$ refers to the input data $x_i$ 100 from the coupled processing cell from the previous clock cycle. The term $p_k$ represents the input data of the current clock cycle. The input $p_k$ is passed through to output $p_{k-1}$ from a cell $D_k$ to the previous cell $D_{k-1}$ in the array. Thus, the terms $p_k$ and $p_{k-1}$ may be referred to as “propagated inputs.”

The basic processing cell 200, shown in FIG. 14, may be repeatedly built and coupled to perform the forward DWT computation. The DWT may be advantageous for compression since it is a “pyramid” algorithm. In a pyramid algorithm, for a one-dimensional DWT, every other sample is thrown away or downsampled, and thus the data set is halved at each iteration. Thus, after $J$ iterations of the FIG. 13 computation, the number of samples being manipulated shrinks by a factor of $2^J$. To compute a DWT of a signal of $N$ samples, the CPU cycle time required would amount to only $K \cdot N$, where $K$ is a constant determined by the wavelet chosen (i.e., by the coefficients used).
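One way to see why the cost stays linear in the number of samples is to add up the samples handled at each level. The sum below assumes that each iteration processes only the half retained from the previous iteration, which is an assumption about how the iterations are chained rather than something stated above.

$$
N + \frac{N}{2} + \frac{N}{4} + \cdots + \frac{N}{2^{J-1}} \;=\; 2N\left(1 - 2^{-J}\right) \;<\; 2N
$$

Under that assumption the total work over any number of iterations is bounded by a constant multiple of $N$, which is consistent with a cycle count of the form $K \cdot N$.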

The forward DWT computation may be represented by $a_n=\sum_k h_{2n-k}\,x_k$ (LFS outputs) and $c_n=\sum_k g_{2n-k}\,x_k$ (HFS outputs). The low-pass filter coefficients $h_i$ and the high-pass filter coefficients $g_i$ have certain symmetric properties which can be manipulated to implement the unit 77. The filters used for the DWT may be IIR (Infinite Impulse Response) or FIR (Finite Impulse Response) digital filters. For example, one type of filter may be a biorthogonal spline filter that has nine low-pass filter coefficients $h_{-4}$, $h_{-3}$, $h_{-2}$, $h_{-1}$, $h_0$, $h_1$, $h_2$, $h_3$ and $h_4$. The biorthogonal spline filter also has seven high-pass filter coefficients $g_{-2}$, $g_{-1}$, $g_0$, $g_1$, $g_2$, $g_3$ and $g_4$.
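The two summations can be evaluated directly, as in the following sketch. The coefficient values are placeholders chosen only to have the right number of taps; they are not the actual biorthogonal spline coefficients, and the function name `subband` is hypothetical.

```python
def subband(x, taps, n_outputs):
    """Return [sum_k taps[2n - k] * x[k] for n in range(n_outputs)].

    taps maps a (possibly negative) integer index to a coefficient;
    indices missing from taps are treated as zero.
    """
    out = []
    for n in range(n_outputs):
        out.append(sum(taps.get(2 * n - k, 0) * xk for k, xk in enumerate(x)))
    return out


# Placeholder taps (assumed values): nine low-pass, seven high-pass.
h = {-4: 1, -3: 2, -2: 3, -1: 4, 0: 5, 1: 4, 2: 3, 3: 2, 4: 1}
g = {-2: -1, -1: 2, 0: -3, 1: 4, 2: -3, 3: 2, 4: -1}

x = [1, 2, 3, 4, 5, 6, 7, 8]
a = subband(x, h, len(x) // 2)  # LFS outputs
c = subband(x, g, len(x) // 2)  # HFS outputs

print(a[0], a[1], c[0])  # 35 76 -2
```

Each sub-band holds half as many values as the input, reflecting the downsampling by two built into the index $2n-k$.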

The LFS outputs are as follows:

$$
\begin{aligned}
a_0 &= h_0x_0+h_{-1}x_1+h_{-2}x_2+h_{-3}x_3+h_{-4}x_4,\\
a_1 &= h_2x_0+h_1x_1+h_0x_2+h_{-1}x_3+h_{-2}x_4+h_{-3}x_5+h_{-4}x_6,\\
a_2 &= h_4x_0+h_3x_1+h_2x_2+h_1x_3+h_0x_4+h_{-1}x_5+h_{-2}x_6+h_{-3}x_7+h_{-4}x_8,
\end{aligned}
$$

and continue to follow the below sequence:

$$
\begin{aligned}
a_{N/2-2} &= h_4x_{N-8}+h_3x_{N-7}+h_2x_{N-6}+h_1x_{N-5}+h_0x_{N-4}+h_{-1}x_{N-3}+h_{-2}x_{N-2}+h_{-3}x_{N-1},\\
a_{N/2-1} &= h_4x_{N-6}+h_3x_{N-5}+h_2x_{N-4}+h_1x_{N-3}+h_0x_{N-2}+h_{-1}x_{N-1}.
\end{aligned}
$$

One property of the low-pass filter coefficients is that of symmetry such that $h_{-i}=h_i$. Thus, $h_{-1}=h_1$, $h_{-2}=h_2$, $h_{-3}=h_3$, and $h_{-4}=h_4$.

Thus, for example, $a_1$ may be rewritten as: $a_1=h_0x_2+h_1(x_1+x_3)+h_2(x_0+x_4)+h_3x_5+h_4x_6$. Likewise, other LFS outputs may be conveniently re-arranged such that only one add and one multiply operation is required in each processing cell. The simplified LFS outputs after applying the symmetric filter properties for low-pass coefficients are as follows:

$$
\begin{aligned}
a_0 &= h_0x_0+h_1x_1+h_2x_2+h_3x_3+h_4x_4\\
    &= h_0(x_0+0)+h_1(x_1+0)+h_2(x_2+0)+h_3(x_3+0)+h_4(x_4+0)\\
a_1 &= h_2x_0+h_1x_1+h_0x_2+h_1x_3+h_2x_4+h_3x_5+h_4x_6\\
    &= h_0(x_2+0)+h_1(x_1+x_3)+h_2(x_0+x_4)+h_3(x_5+0)+h_4(x_6+0)\\
a_2 &= h_0(x_4+0)+h_1(x_3+x_5)+h_2(x_2+x_6)+h_3(x_1+x_7)+h_4(x_0+x_8)\\
a_3 &= h_0(x_6+0)+h_1(x_5+x_7)+h_2(x_4+x_8)+h_3(x_3+x_9)+h_4(x_2+x_{10})\\
a_4 &= h_0(x_8+0)+h_1(x_7+x_9)+h_2(x_6+x_{10})+h_3(x_5+x_{11})+h_4(x_4+x_{12})
\end{aligned}
$$

The symmetry of the coefficients also reduces the total number of processing cells required so that an architecture like that of the unit 77 may be used to compute the forward DWT. At the fourth clock cycle, where $i=3$, $a_{i-4}$ is ignored (skipped), and $c_0$, which is the first HFS output, is instead observed after being computed, as described below.
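The folding of symmetric coefficients can be checked numerically. In the sketch below, `lfs_direct` evaluates the nine-tap summation and `lfs_folded` uses only five coefficients; both names and the coefficient values are illustrative assumptions, and samples outside the signal are taken as zero.

```python
def lfs_direct(x, h, n):
    """a_n = sum_k h[2n - k] * x[k], using all nine taps h[-4..4]."""
    return sum(h[2 * n - k] * xk for k, xk in enumerate(x) if -4 <= 2 * n - k <= 4)


def lfs_folded(x, h, n):
    """Same output using only h[0..4]: one add and one multiply per coefficient."""
    def sample(k):
        return x[k] if 0 <= k < len(x) else 0  # samples outside the signal count as 0

    centre = 2 * n
    total = h[0] * sample(centre)
    for j in range(1, 5):
        total += h[j] * (sample(centre - j) + sample(centre + j))
    return total


h = {-4: 1, -3: 2, -2: 3, -1: 4, 0: 5, 1: 4, 2: 3, 3: 2, 4: 1}  # placeholder taps
x = list(range(1, 13))

same = all(lfs_direct(x, h, n) == lfs_folded(x, h, n) for n in range(len(x) // 2))
print(same, lfs_folded(x, h, 0), lfs_folded(x, h, 1))  # True 35 76
```

The folded form needs five multiplications per output instead of up to nine, which is the saving that lets five processing cells suffice.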

At the fifth clock cycle, $i=4$, the $c_{i-3}$ 160 output is skipped, and the LFS output $a_{i-4}$ 150 is instead observed. The term $a_0$ is computed as follows. At $i=4$, the fifth clock cycle, cell $D_0$ 121 receives $x_4$, $D_1$ 122 receives $x_3$ (from the previous cycle), $D_2$ 124 receives $x_2$, $D_3$ 126 receives $x_1$ and $D_4$ 128 receives 0 as their respective $q_i$ values. Also, at $i=4$, the propagated input $p_i$ for cell $D_4$ 128 is $x_0$. Since there is, by definition, no $x_{-1}$, $x_{-2}$, etc., cells $D_1$ 122 and $D_0$ 121 receive null, or 0, values at $i=4$. Using the basic formula $L_i=(p_i+q_i)\,h$ for each processing cell, $D_0$ 121 provides the intermediate output $L_0=(0+x_4)\,h_4$. Likewise, $D_1$ 122 generates $L_1=(0+x_3)\,h_3$, $D_2$ 124 generates $L_2=(0+x_2)\,h_2$, $D_3$ 126 generates $L_3=(0+x_1)\,h_1$ and $D_4$ 128 generates $L_4=(x_0+0)\,h_0$. Adder 130 computes the sum of $L_0$, $L_1$, $L_2$, $L_3$ and $L_4$, which yields the first output $a_0=x_0h_0+x_1h_1+x_2h_2+x_3h_3+x_4h_4$.

In the case for $i=4$, $D_3$ 126 receives no propagated input from $D_4$ 128 since $D_4$ 128 received no propagated input $q_i$ until $i=4$. Similarly, all the LFS outputs $a_{i-4}$ 150 and HFS outputs $c_{i-3}$ 160 may be computed. The processing cell 200 may also contain a latch register or other mechanism to hold propagated inputs before passing them to the next cell. One reasonably skilled in the art of digital design will readily be able to design and implement the add, multiply and delay elements required by the precision and value of the DWT inputs and outputs.

The HFS outputs are as follows:

$$
\begin{aligned}
c_0 &= g_0x_0+g_{-1}x_1+g_{-2}x_2,\\
c_1 &= g_2x_0+g_1x_1+g_0x_2+g_{-1}x_3+g_{-2}x_4,\\
c_2 &= g_4x_0+g_3x_1+g_2x_2+g_1x_3+g_0x_4+g_{-1}x_5+g_{-2}x_6,
\end{aligned}
$$

and continue to follow the below sequence:

$$
\begin{aligned}
c_{N/2-2} &= g_4x_{N-8}+g_3x_{N-7}+g_2x_{N-6}+g_1x_{N-5}+g_0x_{N-4}+g_{-1}x_{N-3}+g_{-2}x_{N-2},\\
c_{N/2-1} &= g_4x_{N-6}+g_3x_{N-5}+g_2x_{N-4}+g_1x_{N-3}+g_0x_{N-2}+g_{-1}x_{N-1}.
\end{aligned}
$$

The high-pass coefficients are also symmetric, but in a different sense. The inverse DWT low-pass filter coefficients are represented by $\bar{h}$. These inverse coefficients are symmetric such that $\bar{h}_n=\bar{h}_{-n}$. The high-pass forward DWT filter coefficients have the property $g_n=(-1)^n\,\bar{h}_{1-n}$. Since $\bar{h}_n=\bar{h}_{-n}$, also $g_n=(-1)^n\,\bar{h}_{n-1}$. Thus, for $n=2$, $g_2=\bar{h}_1$, but also for $n=0$, $g_0=\bar{h}_{-1}=\bar{h}_1$. Therefore, $g_2=g_0=\bar{h}_1$. Likewise, it can be shown that $g_{-2}=g_4=\bar{h}_3$ and $g_{-1}=g_3=-\bar{h}_2$. Substituting these equations into the expanded $c_n$ yields, for instance, $c_2=g_1x_3+g_2(x_2+x_4)+g_3(x_1+x_5)+g_4(x_0+x_6)$. Thus, only the coefficients $g_1$, $g_2$, $g_3$ and $g_4$ are required to compute the HFS outputs $c_{i-3}$ 160. The symmetric property of biorthogonal spline filter coefficients allows a great reduction in computation and consequently in the architecture required.

The simplified HFS outputs after applying the symmetric filter properties for high-pass coefficients are as follows:

$$
\begin{aligned}
c_0 &= g_2x_0+g_3x_1+g_4x_2\\
    &= g_2(x_0+0)+g_3(x_1+0)+g_4(x_2+0)\\
c_1 &= g_2x_0+g_1x_1+g_0x_2+g_{-1}x_3+g_{-2}x_4\\
    &= g_1(x_1+0)+g_2(x_2+x_0)+g_3(x_3+0)+g_4(x_4+0)\\
c_2 &= g_4x_0+g_3x_1+g_2x_2+g_1x_3+g_0x_4+g_{-1}x_5+g_{-2}x_6\\
    &= g_1(x_3+0)+g_2(x_4+x_2)+g_3(x_5+x_1)+g_4(x_6+x_0)\\
c_3 &= g_4x_2+g_3x_3+g_2x_4+g_1x_5+g_0x_6+g_{-1}x_7+g_{-2}x_8\\
    &= g_1(x_5+0)+g_2(x_6+x_4)+g_3(x_7+x_3)+g_4(x_8+x_2)\\
c_4 &= g_4x_4+g_3x_5+g_2x_6+g_1x_7+g_0x_8+g_{-1}x_9+g_{-2}x_{10}\\
    &= g_1(x_7+0)+g_2(x_8+x_6)+g_3(x_9+x_5)+g_4(x_{10}+x_4)
\end{aligned}
$$

The first clock cycle producing a valid HFS output is at $i=3$. At $i=3$, cell $D_1$ 122 receives as its $q_i$ the input value $x_2$ (delayed at $i=2$ by delay element 110), $D_2$ 124 receives $x_1$ and $D_3$ 126 receives $x_0$. $D_4$ 128 always receives a $q_i$ of 0. The propagated inputs $p_i$ at $i=3$ are all 0. Therefore, according to the basic formula $M_i=(p_i+q_i)\,g$, the intermediate outputs at $i=3$ are: $M_1=x_2g_4$, $M_2=x_1g_3$ and $M_3=x_0g_2$. The intermediate output $M_4$ is zero because both the $q_i$ and $p_i$ inputs are 0 for cell $D_4$ 128. Adder 140 adds the intermediate outputs $M_1$, $M_2$, $M_3$ and $M_4$ to obtain $c_0=x_0g_2+x_1g_3+x_2g_4$, which matches the first equation for the simplified HFS outputs. At $i=4$, as mentioned when discussing the LFS output, the HFS output is ignored (not observed), and instead, at $i=5$, the next HFS output $c_1$ is observed. TABLE 1 below summarizes the intermediate outputs $L$ and $M$ and the final outputs $a_{i-4}$ and $c_{i-3}$ at each clock cycle for the forward DWT in the first eight clock cycles ($i=0$ to 7) and matches the above equations for the simplified outputs.

TABLE 1

| $i$ | $(L_0, M_0)$ | $(L_1, M_1)$ | $(L_2, M_2)$ | $(L_3, M_3)$ | $(L_4, M_4)$ | $a_{i-4}$ 150 | $c_{i-3}$ 160 |
|---|---|---|---|---|---|---|---|
| 0 | $(h_4x_0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | Not valid | Not valid |
| 1 | $(h_4x_1,\ 0)$ | $(h_3x_0,\ g_4x_0)$ | $(0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | Not valid | Not valid |
| 2 | $(h_4x_2,\ 0)$ | $(h_3x_1,\ g_4x_1)$ | $(h_2x_0,\ g_3x_0)$ | $(0,\ 0)$ | $(0,\ 0)$ | Not valid | Not valid |
| 3 | $(h_4x_3,\ 0)$ | $(h_3x_2,\ g_4x_2)$ | $(h_2x_1,\ g_3x_1)$ | $(h_1x_0,\ g_2x_0)$ | $(0,\ 0)$ | Not valid | $c_0$ |
| 4 | $(h_4x_4,\ 0)$ | $(h_3x_3,\ g_4x_3)$ | $(h_2x_2,\ g_3x_2)$ | $(h_1x_1,\ g_2x_1)$ | $(h_0x_0,\ g_1x_0)$ | $a_0$ | Not observed |
| 5 | $(h_4x_5,\ 0)$ | $(h_3x_4,\ g_4x_4)$ | $(h_2x_3,\ g_3x_3)$ | $(h_1(x_2+x_0),\ g_2(x_2+x_0))$ | $(h_0x_1,\ g_1x_1)$ | Not observed | $c_1$ |
| 6 | $(h_4x_6,\ 0)$ | $(h_3x_5,\ g_4x_5)$ | $(h_2(x_4+x_0),\ g_3(x_4+x_0))$ | $(h_1(x_3+x_1),\ g_2(x_3+x_1))$ | $(h_0x_2,\ g_1x_2)$ | $a_1$ | Not observed |
| 7 | $(h_4x_7,\ 0)$ | $(h_3(x_6+x_0),\ g_4(x_6+x_0))$ | $(h_2(x_5+x_1),\ g_3(x_5+x_1))$ | $(h_1(x_4+x_2),\ g_2(x_4+x_2))$ | $(h_0x_3,\ g_1x_3)$ | Not observed | $c_2$ |
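The clock-by-clock behaviour summarized in TABLE 1 can be modelled in software. The sketch below is a behavioural model inferred from the table entries, not a description of the actual circuit: it assumes that at clock i the cell D_k sums the sample delayed by k clocks and the sample delayed by 8-k clocks, and it uses placeholder coefficient values.

```python
def forward_dwt_schedule(x, h, g):
    """Clock-by-clock model of the cell outputs listed in TABLE 1.

    h holds h0..h4 and g maps 1..4 to g1..g4 (the only taps the cells need).
    Returns the observed LFS outputs and HFS outputs, in order.
    """
    def sample(k):
        return x[k] if 0 <= k < len(x) else 0

    lfs, hfs = [], []
    for i in range(len(x)):
        # Cell D_k adds the sample delayed by k clocks to the propagated
        # sample delayed by 8-k clocks; D_4 has no second operand.
        sums = [sample(i - k) + (sample(i - 8 + k) if k < 4 else 0) for k in range(5)]
        L = [sums[k] * h[4 - k] for k in range(5)]           # D_0 uses h4 ... D_4 uses h0
        M = [0] + [sums[k] * g[5 - k] for k in range(1, 5)]  # D_1 uses g4 ... D_4 uses g1
        if i >= 4 and i % 2 == 0:
            lfs.append(sum(L))   # LFS output observed on even clocks from i=4
        elif i >= 3 and i % 2 == 1:
            hfs.append(sum(M))   # HFS output observed on odd clocks from i=3
    return lfs, hfs


h = [5, 4, 3, 2, 1]               # placeholder h0..h4
g = {1: 4, 2: -3, 3: 2, 4: -1}    # placeholder g1..g4
lfs, hfs = forward_dwt_schedule([1, 2, 3, 4, 5, 6, 7, 8], h, g)
print(lfs, hfs)  # [35, 76] [-2, -1, 0]
```

Over the first eight clocks the model yields two LFS outputs and three HFS outputs, matching the pattern of valid entries in the last two columns of the table.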

Referring to FIG. 15, for the inverse transform, the reconstruction of the sequence $x_i$ may be represented by the expression $x_i=\sum_n\left[\bar{h}_{2n-i}\,a_n+\bar{g}_{2n-i}\,c_n\right]=\sum_n\bar{h}_{2n-i}\,a_n+\sum_n\bar{g}_{2n-i}\,c_n$, where $a_n$ represents the LFS outputs of the forward DWT and $c_n$ represents the HFS outputs. The inverse DWT has inverse high-pass filter coefficients $\bar{g}$ and inverse low-pass filter coefficients $\bar{h}$.

The $x_i$ reconstruction may be split into a sum of two summations:

$$
x_i^{(1)}=\sum_n\bar{h}_{2n-i}\,a_n,\qquad x_i^{(2)}=\sum_n\bar{g}_{2n-i}\,c_n.
$$

The even terms of $x_i^{(1)}$ and $x_i^{(2)}$, i.e., $x_{2j}^{(1)}$ and $x_{2j}^{(2)}$ for $j=0, 1, \ldots, N/2-1$, using these filter coefficients are expanded as:

$$
\begin{aligned}
x_0^{(1)} &= \bar{h}_0a_0+\bar{h}_2a_1, & x_0^{(2)} &= \bar{g}_0c_0+\bar{g}_2c_1+\bar{g}_4c_2,\\
x_2^{(1)} &= \bar{h}_{-2}a_0+\bar{h}_0a_1+\bar{h}_2a_2, & x_2^{(2)} &= \bar{g}_{-2}c_0+\bar{g}_0c_1+\bar{g}_2c_2+\bar{g}_4c_3,\\
x_4^{(1)} &= \bar{h}_{-2}a_1+\bar{h}_0a_2+\bar{h}_2a_3, & x_4^{(2)} &= \bar{g}_{-2}c_1+\bar{g}_0c_2+\bar{g}_2c_3+\bar{g}_4c_4,
\end{aligned}
$$

and continue pursuant to the sequence described below:

$$
\begin{aligned}
x_{N-6}^{(1)} &= \bar{h}_{-2}a_{N/2-4}+\bar{h}_0a_{N/2-3}+\bar{h}_2a_{N/2-2}, & x_{N-6}^{(2)} &= \bar{g}_{-2}c_{N/2-4}+\bar{g}_0c_{N/2-3}+\bar{g}_2c_{N/2-2}+\bar{g}_4c_{N/2-1},\\
x_{N-4}^{(1)} &= \bar{h}_{-2}a_{N/2-3}+\bar{h}_0a_{N/2-2}+\bar{h}_2a_{N/2-1}, & x_{N-4}^{(2)} &= \bar{g}_{-2}c_{N/2-3}+\bar{g}_0c_{N/2-2}+\bar{g}_2c_{N/2-1},\\
x_{N-2}^{(1)} &= \bar{h}_{-2}a_{N/2-2}+\bar{h}_0a_{N/2-1}, & x_{N-2}^{(2)} &= \bar{g}_{-2}c_{N/2-2}+\bar{g}_0c_{N/2-1}.
\end{aligned}
$$

The inverse filter coefficients, like their forward counterparts, have certain symmetric properties allowing some associative grouping. One property of the inverse high-pass coefficients is $\bar{g}_n=(-1)^n\,h_{1-n}$. Since $h_n=h_{-n}$, the inverse high-pass coefficients also have a property such that $\bar{g}_n=(-1)^n\,h_{n-1}$. Thus, for $n=0$, $\bar{g}_0=h_{-1}$. For $n=2$, since $\bar{g}_2=h_1$ and $h_{-1}=\bar{g}_0$, $\bar{g}_2=\bar{g}_0$. Likewise, for $n=4$, $\bar{g}_4=h_3=\bar{g}_{-2}$. As discussed above, the inverse low-pass coefficients have the property $\bar{h}_n=\bar{h}_{-n}$, such that $\bar{h}_2=\bar{h}_{-2}$. Thus, the even-numbered outputs $x_{2j}$ require only four coefficients, $\bar{h}_0$, $\bar{h}_2$, $\bar{g}_2$ and $\bar{g}_4$, in order to be computed. Similarly, for odd-numbered outputs $x_{2j-1}$, it can be shown by the same filter properties discussed above that only five coefficients, $\bar{h}_3$, $\bar{h}_1$, $\bar{g}_5$, $\bar{g}_3$ and $\bar{g}_1$, are required.
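The pairing of the inverse high-pass coefficients can be verified with a few lines. The sketch assumes placeholder values for the symmetric forward low-pass taps; only the relation between the inverse high-pass and forward low-pass coefficients is taken from the text, and the function name is hypothetical.

```python
def inverse_highpass_taps(h):
    """Apply g_bar[n] = (-1)**n * h[1 - n] for n = -3..5, where h[1 - n] exists."""
    taps = {}
    for n in range(-3, 6):
        sign = 1 if n % 2 == 0 else -1
        taps[n] = sign * h[1 - n]
    return taps


h = {-4: 1, -3: 2, -2: 3, -1: 4, 0: 5, 1: 4, 2: 3, 3: 2, 4: 1}  # placeholder, h[n] == h[-n]
g_bar = inverse_highpass_taps(h)

# Even-indexed taps pair up, so even outputs need only g_bar[2] and g_bar[4].
print(g_bar[0] == g_bar[2], g_bar[-2] == g_bar[4])  # True True
# Odd-indexed taps pair up around n = 1, leaving g_bar[1], g_bar[3], g_bar[5].
print(g_bar[-1] == g_bar[3], g_bar[-3] == g_bar[5])  # True True
```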

The equations above show the complexity of the inverse DWT computation. The architecture for computing the inverse DWT receives two input sequences $a_i$ and $c_i$, which represent the low-frequency sub-band and the high-frequency sub-band, respectively. Comparing FIG. 15 with FIG. 13, the DWT architecture is necessarily asymmetric since the forward architecture receives one input and produces two output sequences, whereas the inverse architecture receives two inputs and produces one output. This architecture is further complicated by the fact that odd-numbered outputs, i.e., $x_1, x_3, x_5, \ldots$, require five processing cells, one for each coefficient, whereas the even-numbered outputs, i.e., $x_0, x_2, x_4, \ldots$, require only four processing cells. Moreover, the data flow for odd and even terms is not symmetric. The odd-numbered outputs are generated only on odd clock cycles while even-numbered outputs are generated only on even clock cycles.

Thus, the inverse DWT architecture typically includes two distinct blocks, an even output generating block 300 and an odd output generating block 350, as shown in FIG. 15. The even output generating block 300 further includes two subcircuits: an even high frequency sub-band subcircuit (HFS) 310 and an even low frequency sub-band subcircuit (LFS) 320. Even HFS subcircuit 310 consists of two processing cells 315 and 317, each of which is composed of a multiplier and an adder. The processing cells 315, 317, 325 and 327 operate similarly to the basic processing cell 200 shown in FIG. 14 except that each needs only one coefficient and consequently computes one intermediate output rather than two. For instance, processing cell 315 outputs a term such that $a_i$ is first added to the propagated input from processing cell 317, with that sum multiplied by $\bar{h}_2$. Likewise for low frequency sub-band circuit 320, processing cell 325 outputs a term to adder/controller 330 which is the product of $\bar{g}_4$ and the sum of the input $c_i$ and the propagated input from processing cell 327. Processing cell 317 receives as one input 0 and as the other input $a_{i-1}$ since delay element 312 holds the value given it on the previous clock, transmitting it on the next clock cycle.

Even output generating block 300 operates as follows. At i=0, a₀ is propagated to delay 312, and c₀ to delay 322. Though a₀ and c₀ are also input to cells 315 and 325, respectively, adder/controller 330 waits until the third clock cycle, when the propagated inputs are non-zero, to output x₀. At i=0, cells 317 and 327 have outputs of 0, since initial values released by delays 312, 324 and 322 are set at zero. At i=1, delay 312 releases a₀ to the p_(i) input of cell 317 and a₁ is held at delay 312 and input to cell 315. As a result, cell 315 generates the term {overscore (h)}₂a₁ and cell 317 generates {overscore (h)}₀a₀. These outputs are sent to adder/controller 330 but are held (latched) until the next clock i=2. At i=1, though cells 325 and 327 generate the terms c₁ {overscore (g)}₄ and c₀ {overscore (g)}₂, respectively, these terms are ignored (cleared) by adder/controller 330 since, according to the reconstruction formula, the first output x₀ requires a c₂ term.

At i=2, the third clock cycle, delay 324 releases c₀ to the p_(i) (propagated) input of cell 327 and delay 322 releases c₁ to the q_(i) input of cell 327. Thus, cell 327 generates the term (c₁+c₀) {overscore (g)}₂. Cell 325 generates c₂ {overscore (g)}₄. As described above, the outputs of cells 315 and 317 from the previous clock were held at adder/controller 330 and are summed, at i=2, with the terms generated by cells 325 and 327. Again, at i=2, even though cells 315 and 317 generate (a₀+a₂) {overscore (h)}₂ and a₁{overscore (h)}₀, respectively, these terms are held one clock cycle. Instead, the i=2 outputs of cells 325 and 327, which are c₂ {overscore (g)}₄ and (c₀+c₁)*{overscore (g)}₂, respectively, are summed with the i=1 outputs of cells 315 and 317, which are {overscore (h)}₂a₁ and {overscore (h)}₀a₀, respectively. Hence, adder/controller 330 generates the first output x₀={overscore (h)}₀a₀+{overscore (h)}₂a₁+c₀{overscore (g)}₂+c₁{overscore (g)}₂+c₂{overscore (g)}₄.

Thus, for each clock cycle i after i=2 (the third clock cycle), adder/controller 330 receives the current outputs of subcircuit 320 and adds them to the previous clock's outputs from subcircuit 310. Additionally, adder/controller 330 receives the current outputs of subcircuit 310, holding them until the next clock cycle.
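
The latching behavior of adder/controller 330 for the first even output may be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the circuit itself: the function name first_even_output, the dictionary layout of the coefficients and the numeric values are hypothetical and chosen only to make the arithmetic easy to follow. Only the cell outputs and the final sum follow the description above.

```python
def first_even_output(a, c, h_bar, g_bar):
    """Model the two-step summation that yields x0 in block 300.

    a, c   -- input sequences a_i and c_i (needs a[0..1] and c[0..2])
    h_bar  -- dict mapping coefficient index to value, keys 0 and 2
    g_bar  -- dict mapping coefficient index to value, keys 2 and 4
    """
    # i=1: cells 315 and 317 produce their terms; adder/controller 330 latches them.
    latched_315 = h_bar[2] * a[1]
    latched_317 = h_bar[0] * a[0]

    # i=2: cells 325 and 327 produce their terms.
    out_325 = g_bar[4] * c[2]
    out_327 = g_bar[2] * (c[1] + c[0])

    # i=2: the current outputs of subcircuit 320 are added to the
    # latched (previous clock) outputs of subcircuit 310.
    return latched_315 + latched_317 + out_325 + out_327


# Placeholder values (assumptions, not actual filter coefficients).
a = [1, 2]
c = [3, 4, 5]
h_bar = {0: 10, 2: 100}
g_bar = {2: 1000, 4: 10000}
x0 = first_even_output(a, c, h_bar, g_bar)
print(x0)
```

With these placeholder values each term occupies its own decimal position, so the contribution of every cell can be read directly from the printed sum.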

FIG. 15 also shows an odd output generating block 350 which requires five processing cells—365, 367, 375, 377 and 379. The processing cells 365, 367, 375, 377 and 379 operate similarly to the processing cell 200 shown in FIG. 14. The delay elements 362, 364, 372 and 374 hold their inputs for one clock cycle and release them on the next clock cycle. Each cell has an adder and multiplier and receives a propagated input from the cell to which it is connected.

Odd output generating block 350 operates as follows. At i=0, a₀ is propagated to cell 365 and is held one clock cycle at delay 362, while cell 375 receives c₀. At i=1, delay 362 releases a₀ to cell 367, while delay 372 releases c₀ to cell 377. Also, at i=1 a₁ and c₁ are input to cells 365 and 375, respectively. At i=2, cell 365 receives a₂, cell 367 receives a₁ as its q_(i) and receives a₀ as its p_(i) input. Thus, cell 365 generates a term a₂{overscore (h)}₃ and cell 367 generates (a₁+a₀){overscore (h)}₁. These outputs are sent to adder/controller 380 but are held for one clock cycle before being summed with the outputs of cells 375, 377 and 379. At i=2, the outputs of cells 375, 377 and 379 are ignored by adder/controller 380.

At i=3, c₃ is input to cell 375, cell 377 receives c₂ from the delay 372, cell 379 receives as propagated input c₁, and cell 377 receives as its propagated input c₀. Thus, cell 375 generates the term c₃{overscore (g)}₅, cell 377 generates the term (c₀+c₂)*{overscore (g)}₃, and cell 379 generates c₁{overscore (g)}₁. These outputs are received by adder/controller 380, which adds the i=3 outputs of cells 375, 377 and 379 to the latched i=2 outputs of cells 365 and 367 from the previous clock cycle. Hence, adder/controller 380 generates the second output (the first odd output) x₁={overscore (h)}₁a₀+{overscore (h)}₁a₁+{overscore (h)}₃a₂+{overscore (g)}₃c₀+{overscore (g)}₃c₂+{overscore (g)}₅c₃+{overscore (g)}₁c₁.

Thus, for each clock cycle i after i=3 (the fourth clock cycle), adder/controller 380 receives the current outputs of cells 375, 377 and 379 and adds them to the previous clock cycle's outputs of cells 365 and 367. Additionally, adder/controller 380 receives the current clock's outputs of cells 365 and 367, holding them until the next clock cycle.
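
Grouping the terms of the first odd output by the processing cell that produces them may make the data flow easier to follow. The expression below only restates the sum given above for x₁; the cell labels under each group are added for illustration, and the \text command assumes the amsmath package.

```latex
\[
x_1 = \underbrace{\bar{h}_3 a_2}_{\text{cell 365}}
    + \underbrace{\bar{h}_1 (a_0 + a_1)}_{\text{cell 367}}
    + \underbrace{\bar{g}_5 c_3}_{\text{cell 375}}
    + \underbrace{\bar{g}_3 (c_0 + c_2)}_{\text{cell 377}}
    + \underbrace{\bar{g}_1 c_1}_{\text{cell 379}}
\]
```

The first two groups are the latched i=2 outputs and the last three are the i=3 outputs summed by adder/controller 380.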

FIGS. 13 and 15 show the complexity necessary to compute the DWT and inverse DWT. In an image processing chip, for instance, the circuitry should be capable of both the forward and inverse DWT. In addition to implementing biorthogonal spline filters for reducing and simplifying the forward and inverse DWT as shown in FIGS. 13 and 15, the invention also provides for an integrated architecture performing both.

Referring to FIG. 16, in some embodiments, a single integrated systolic architecture may be used for both the forward and the inverse Discrete Wavelet Transform. The expressions described for the inverse DWT and forward DWT above are implemented in one integrated architecture. Thus, the description of FIG. 16 will concentrate on the architecture rather than the mathematical properties which make an integrated architecture possible, since these properties have already been described.

FIG. 16 shows four control signals: a clocking signal 401, a selector signal I₁, a selector signal I₂ and a signal I₃. O₁ and O₂ are generated by an adder module and control circuit 480. Five processing cells (410, 420, 430, 440 and 450) each have an adder and multiplier (and registers storing the two coefficients). Each cell has the coefficients shown stored in registers or other such devices (e.g., {overscore (h)}₃ and {overscore (h)}₂ for cell 410). Input line 400 is coupled via multiplexers and delays to each of the processing cells. The integrated architecture has a delay element 402, multiplexer (MUX) 472, delay element 404, MUX 476, MUX 470, delay element 406, MUX 471 and delay element 408 coupled to line 400. Also shown are a MUX 478, MUX 479, MUX 474, and two output lines O₁ and O₂.

The control signals I₁ and I₂ are both at a low logic level when the architecture is to function as a forward DWT circuit. During the forward DWT, the input line 400 is sending values corresponding to input samples prior to decomposition. Delay elements 402, 404, 406 and 408 hold samples from the input line 400 for one clock and then release them on the next clock. For MUX 478 and MUX 470, the select line/control signal is I₁. When I₁=0, MUX 470 selects line 400 rather than the input c_(i) and MUX 478 selects the propagated input from cell 430. For MUX 472, MUX 474, MUX 476, MUX 479 and MUX 471, I₃ is the select line or control signal.

When I₃ is low (0), MUX 472 selects input line 400, MUX 474 selects the propagated input from cell 420, MUX 476 selects the input from the output of MUX 478, MUX 479 selects the propagated input output by cell 440 and MUX 471 selects the propagated input output by cell 450. In the forward DWT mode, therefore, the circuit functions similarly to unit 77. Even though there are nine forward DWT filter coefficients (both high-pass and low-pass) and nine inverse DWT filter coefficients, due to the symmetry properties discussed above, the architecture can be simplified to a total of nine coefficients (both low and high pass). With only nine coefficients, only five basic processing cells, such as that shown in FIG. 14, are needed to compute the forward and inverse DWT on the integrated architecture.

In the inverse DWT computation, as described above, two input sequences a_(i) and c_(i) generate one output sequence x_(i) which is the reconstructed original sample. However, odd and even numbered outputs are computed separately, and observed on alternating clock cycles. Thus, in FIG. 16, the output x_(2j+1), representing the odd-numbered outputs, is observed on the O₁ terminal, and the even-numbered outputs are observed on the O₂ terminal. For the inverse DWT, I₁ and I₂ are both set to one. When the clock signal is low, I₃, after a certain delay, is also low; when the clock signal is high, I₃ is also high (since I₂ is high, the output of AND gate 490 (I₃) follows the clock signal, subject to the propagation delay through gate 490).

When I₃ is high, MUX 471 selects input line 400, MUX 479 selects the propagated input output by cell 450, MUX 476 selects the input line 400 as does MUX 474, and MUX 472 selects 0. In the inverse DWT mode, line 400 now propagates a_(i-1) rather than x_(i). In inverse DWT mode, I₁=1 (is always high), and as a result, MUX 470 selects the input c_(i) and MUX 478 selects the input line 400 (after delay element 404). Utilizing these control signals and multiplexers, the architecture of FIG. 16 is capable of functioning as an inverse DWT (see FIG. 15) or a forward DWT (see FIG. 13). No complex “scheduling” of input and intermediate outputs is required as in traditional architectures and, for every clock, an output is produced at terminals O₁ and O₂.
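
The multiplexer selections stated above may be collected into a small lookup, sketched here in Python. The function name mux_selections and the descriptive strings are hypothetical conveniences for illustration; the selections themselves are only those recited in the description of FIG. 16, keyed by the reference numeral of each multiplexer.

```python
def mux_selections(i1, i3):
    """Return the input selected by each multiplexer for given I1 and I3."""
    if i1 not in (0, 1) or i3 not in (0, 1):
        raise ValueError("control signals must be 0 or 1")

    sel = {}

    # MUX 470 and MUX 478 use I1 as the select line.
    if i1 == 0:
        sel[470] = "input line 400"
        sel[478] = "propagated input from cell 430"
    else:
        sel[470] = "input c_i"
        sel[478] = "input line 400 (after delay element 404)"

    # MUX 471, 472, 474, 476 and 479 use I3 as the select line.
    if i3 == 0:
        sel[472] = "input line 400"
        sel[474] = "propagated input from cell 420"
        sel[476] = "output of MUX 478"
        sel[479] = "propagated input from cell 440"
        sel[471] = "propagated input from cell 450"
    else:
        sel[471] = "input line 400"
        sel[479] = "propagated input from cell 450"
        sel[476] = "input line 400"
        sel[474] = "input line 400"
        sel[472] = "0"

    return sel


# Forward DWT mode: I1 = 0 and I3 = 0 for all clock cycles.
for mux, source in sorted(mux_selections(0, 0).items()):
    print(mux, "->", source)
```

In the inverse DWT mode the same function would be called with i1=1 and with i3 alternating between 0 and 1 as the clocking signal toggles.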

Referring to FIGS. 17, 18, 19, 20, 21 and 22, the I₁ control signal is set to 0 for all clock cycles during the forward DWT mode. Also, the I₂ control signal is set to 0 for all clock cycles in the forward DWT mode. A clock signal is generated by a clocking mechanism wherein the first half of the first clock cycle is represented by an i value of 0 (low state) and the second half of the first clock cycle by an i value of 1 (high state). The first half of the second clock cycle is represented by an i value of 2. The clock signal is input to an AND gate along with the I₂ signal, and the AND gate generates an output signal I₃. Since I₂ is always 0 for all clock cycles, I₃ is also always 0 during the forward DWT mode. Neither of the output terminals O₁ or O₂ returns a valid value until i=3, whereupon output terminal O₂ returns c₀. Output terminal O₁ does not show a valid output until the first half of the third clock cycle, when i=4, whereupon O₁ returns a value a₀. In the second half of the third clock cycle, at i=5, the output terminal O₁ does not return a valid value, but the output terminal O₂ returns a valid value c₁. Thus, starting with the second half of the second clock cycle (i=3), the output terminals O₁ and O₂ alternately produce valid values, with O₁ producing the sequence a_(n) and O₂ producing the sequence c_(n).

Referring to FIGS. 23, 24, 25, 26, 27 and 28, in the inverse DWT mode, the control signal I₁ is always set to 1 as is the control signal I₂. I₂ and the clocking signal are both input to an AND gate to generate the control signal I₃. Due to the delay of propagating through the AND gate, the signal I₃ is not coincident with the clocking signal, but is slightly delayed on its trailing edge. Since the even and odd reconstructed output sequences computed from the inverse DWT are very different in characteristic (see FIG. 15 and associated description), the odd numbered outputs x₁, x₃, x₅, etc. are observed on output terminal O₁, whereas the even numbered outputs x₀, x₂, x₄, etc. are observed on output terminal O₂. When I₃ is high, the output terminal O₂ generates the even numbered output terms of the reconstructed input sequence and, when I₃ is low, the output terminal O₁ generates the odd numbered outputs of the reconstructed input sequence. As described earlier, the first valid output x₀ is observed on terminal O₂ only starting with the second half of the second clock cycle, where i=3. On the next half clock cycle, at i=4, output terminal O₁ generates a valid term x₁, while the output terminal O₂ produces an invalid value. Thus, on alternating half clock cycles starting with i=3, the output terminals O₁ and O₂ combined generate the entire reconstructed input sequence x_(n).
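
The way the two output terminals combine into one reconstructed sequence may be sketched in Python as follows. This is a simplified model under stated assumptions: the function name interleave_outputs and the use of dictionaries keyed by the half clock cycle index i are hypothetical, the propagation delay through the AND gate is ignored, and the string values stand in for actual samples.

```python
def interleave_outputs(o1_samples, o2_samples, first_valid_i=3):
    """Merge the valid readings of terminals O1 and O2 into x_n.

    o1_samples, o2_samples -- dicts mapping half clock cycle index i to the
    valid value observed on O1 (odd outputs) or O2 (even outputs).
    """
    x = []
    i = first_valid_i
    while True:
        # Odd i is the second half of a clock cycle (I3 high): O2 is valid.
        # Even i is the first half of a clock cycle (I3 low): O1 is valid.
        source = o2_samples if i % 2 == 1 else o1_samples
        if i not in source:
            break  # no further valid output was recorded
        x.append(source[i])
        i += 1
    return x


# Placeholder readings: x0 on O2 at i=3, x1 on O1 at i=4, and so on.
o2 = {3: "x0", 5: "x2"}
o1 = {4: "x1", 6: "x3"}
sequence = interleave_outputs(o1, o2)
print(sequence)
```

Reading the terminal that is valid on each half clock cycle yields the outputs in their natural order, so no reordering buffer is needed in this model.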

Referring to FIG. 29, the two-dimensional integrated module 700 is the extension of the one-dimensional integrated architecture of FIG. 16 to a two-dimensional case, such as image compression. Within the integrated module 700 are a row-wise one-dimensional integrated module 710 and a column-wise one-dimensional integrated module 730. Each of these integrated modules operates similarly to the integrated architecture shown in FIG. 16, in that they are both capable of computing a forward Discrete Wavelet Transform for a single input sequence x and the inverse Discrete Wavelet Transform for reconstructing the input sequence x based upon high-frequency sub-band inputs and low frequency sub-band inputs. A transpose circuit 720 is coupled between the integrated module 710 and the integrated module 730 to rearrange the data by transposing the matrix.

In the forward DWT case, the input 750 is a single input sequence which is indexed by x and y coordinates. In the forward case, this input 750 is transformed by the two-dimensional integrated module 700 into an output 755 which consists of a low frequency sub-band output sequence A which is indexed by its x and y coordinates (shown as A_(x) and A_(y) in FIG. 29) and a high frequency sub-band output sequence C also indexed by its x and y coordinates (shown as C_(x) and C_(y) in FIG. 29). In the inverse DWT case, the integrated module 700 receives via input 750 a low frequency sub-band input sequence A indexed by its x and y coordinates and a high frequency sub-band input sequence C also indexed by its x and y coordinates. This input 750 in the inverse DWT case generates an output 755 which is a single output sequence which consists of the reconstructed input sequence indexed by both its x and y coordinates. For image compression, the forward DWT case corresponds to the compression of the image while the inverse DWT case corresponds to the decompression of the image.
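
The row-wise pass, transpose and column-wise pass of module 700 may be sketched in Python as follows. This is a minimal sketch under loud assumptions: forward_1d and inverse_1d are hypothetical placeholders that use simple average and difference pairs, NOT the biorthogonal spline filters of the one-dimensional integrated modules, and the final transpose that restores the original orientation is a convenience of this example. Rows and columns are assumed to have even length.

```python
def forward_1d(seq):
    """Placeholder 1-D forward transform (ASSUMPTION: average/difference pairs).

    Returns the low-pass half followed by the high-pass half.
    """
    half = len(seq) // 2
    low = [(seq[2 * k] + seq[2 * k + 1]) / 2 for k in range(half)]
    high = [(seq[2 * k] - seq[2 * k + 1]) / 2 for k in range(half)]
    return low + high


def inverse_1d(coeffs):
    """Placeholder 1-D inverse transform matching forward_1d."""
    half = len(coeffs) // 2
    out = []
    for k in range(half):
        low, high = coeffs[k], coeffs[half + k]
        out.extend([low + high, low - high])
    return out


def transpose(matrix):
    """Swap rows and columns, as transpose circuit 720 does."""
    return [list(col) for col in zip(*matrix)]


def two_dimensional(matrix, one_d):
    """Apply a 1-D transform row-wise, transpose, then apply it again."""
    row_pass = [one_d(row) for row in matrix]         # row-wise module 710
    transposed = transpose(row_pass)                  # transpose circuit 720
    column_pass = [one_d(row) for row in transposed]  # column-wise module 730
    return transpose(column_pass)                     # restore orientation


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
coeffs = two_dimensional(image, forward_1d)    # forward case (compression side)
restored = two_dimensional(coeffs, inverse_1d) # inverse case (decompression side)
print(restored == image)
```

The same two_dimensional routine serves both directions, with only the one-dimensional operation swapped, which mirrors how each integrated module handles both the forward and the inverse transform.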

Referring to FIG. 30, a flowchart of the basic method of integrating forward and inverse DWT in a single architecture includes determining (block 800) whether a first mode, the forward DWT, is required or the second mode, the inverse DWT, is required. In an image processing application, some switch/toggle may perform this selection based on whether compression or decompression is to be performed.

If the first mode is selected, the architecture will begin to perform the forward DWT (block 810). For the forward DWT mode, a first control signal I₁ is set low or a value of 0 (block 811). A second control signal I₂ is also set low or a value of 0 (block 812). Next, the second control signal is propagated as one input of a two input AND gate, while the clocking signal itself is the other input. This generates a third control signal I₃ (block 814). The first and third control signals, I₁ and I₃ are provided to multiplexers to select the appropriate inputs of those multiplexers when computing the forward DWT (block 816). Finally, according to block 818, the forward DWT is computed to completion.

If the second mode is selected, the architecture begins to perform the inverse DWT (block 820). For the inverse DWT mode, a first control signal I₁ is set high or a value of 1 (block 821). A second control signal I₂ is also set high or a value of 1 (block 822). Next, the second control signal is propagated as one input of a two input AND gate, while the clocking signal itself is the other input. This generates a third control signal I₃ (block 824). The first and third control signals, I₁ and I₃, are provided to multiplexers to select the appropriate inputs of those multiplexers when computing the inverse DWT (block 826). Finally, according to block 828, the inverse DWT is computed to completion.

The above flowchart shows that by multiplexing inputs and controlling the selection of the multiplexed inputs depending on the mode selected, a single architecture can be used to compute both the forward and inverse DWT. Not shown in the flowchart of FIG. 30 is that the computing blocks 818 and 828 may utilize biorthogonal spline filter coefficients as a basis for computing the forward and inverse DWT, respectively. Also, symmetry properties of these filter coefficients are used to reduce the output equations for computing both the inverse and forward DWT, with these coefficients being supplied to processing cells of the architecture. The multiplexers allow selective coupling of the processing cells in order to compute the forward and inverse DWT.
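
The mode selection and control signal generation of FIG. 30 may be sketched in Python as follows. The function name control_signals and the mode strings "forward" and "inverse" are hypothetical; the sketch treats the AND gate as ideal and therefore ignores its propagation delay.

```python
def control_signals(mode, clock):
    """Return (I1, I2, I3) for the selected mode and clock level (0 or 1)."""
    if clock not in (0, 1):
        raise ValueError("clock must be 0 or 1")

    if mode == "forward":
        i1, i2 = 0, 0   # blocks 811 and 812
    elif mode == "inverse":
        i1, i2 = 1, 1   # blocks 821 and 822
    else:
        raise ValueError("mode must be 'forward' or 'inverse'")

    # Blocks 814 and 824: two input AND gate fed by I2 and the clocking signal.
    i3 = i2 & clock
    return i1, i2, i3


# In the forward mode I3 stays at 0; in the inverse mode it follows the clock.
for mode in ("forward", "inverse"):
    for clock in (0, 1):
        print(mode, clock, control_signals(mode, clock))
```

Because I₃ is derived from I₂ and the clocking signal, only the two mode signals need to be driven externally in this model.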

While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the invention. 

1. A method comprising generating first difference frames; compressing the first difference frames to form compressed difference frames; decompressing the compressed difference frames to form decompressed difference frames; and using the decompressed difference frames in the generation of the first difference frames.
 2-22. (canceled)
 23. A method comprising in a feedback loop, generating difference frames in response to decompression of the difference frames.
 24. The method of claim 23, further comprising: filtering the difference frames.
 25. The method of claim 23, further comprising: quantizing data associated with the difference frames.
 26. The method of claim 23, further comprising: dequantizing data associated with compressed difference frames.
 27. The method of claim 26, wherein the compressed difference frames include data indicative of frequency sub-band images.
 28. The method of claim 23, further comprising: entropy encoding data associated with the difference frames.
 29. The method of claim 23, further comprising: serially transmitting data associated with the difference frames to a computer.
 30. A system comprising: a pixel sensor; and a circuit to in a feedback loop, generate difference frames indicative of images captured by the pixel sensor in response to decompression of the difference frames.
 31. The system of claim 30, wherein the circuit comprises: a transformation circuit to spatially filter the difference frames to generate data indicative of frequency sub-band images.
 32. The system of claim 30, wherein the circuit comprises: a quantization circuit to quantize data associated with the difference frames.
 33. The system of claim 30, wherein the circuit comprises: a compression circuit to compress the difference frames to form compressed difference frames; and a dequantization circuit to dequantize data associated with the compressed difference frames.
 34. The system of claim 33, wherein the compressed frames include data indicative of frequency sub-band images, the circuit further comprising: an inverse transformation circuit to use the data to generate the decompressed frames.
 35. The system of claim 30, wherein the circuit comprises: an entropy encoder circuit to encode data associated with the difference frames.
 36. A camera comprising: a pixel sensor array to generate captured frames indicative of captured pixel images; a first circuit to generate difference frames indicative of images captured by the pixel sensor array in response to the captured images and decompression of the difference frames; and an interface circuit to communicate signals indicative of the difference frames to circuitry external to the camera.
 37. The camera of claim 36, wherein the first circuit comprises: a transformation circuit to spatially filter the difference frames to generate data indicative of frequency sub-band images.
 38. The camera of claim 37, wherein the circuit further comprises: a quantization circuit to quantize the data.
 39. The camera of claim 36, wherein the first circuit comprises: an entropy encoder circuit to encode the difference frames.
 40. The camera of claim 36, wherein the circuitry external to the camera comprises a serial bus. 